In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture. The compound distribution ("unconditional distribution") is the result of marginalizing (integrating) over the latent random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").

Definition

A compound probability distribution is the probability distribution that results from assuming that a random variable $X$ is distributed according to some parametrized distribution $F$ with an unknown parameter $\theta$ that is again distributed according to some other distribution $G$. The resulting distribution $H$ is said to be the distribution that results from compounding $F$ with $G$. The parameter's distribution $G$ is also called the mixing distribution or latent distribution. Technically, the unconditional distribution $H$ results from marginalizing over $G$, i.e., from integrating out the unknown parameter(s) $\theta$. Its probability density function is given by:

\[ p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\, d\theta \]

The same formula applies analogously if some or all of the variables are vectors. From the above formula, one can see that a compound distribution is essentially a special case of a marginal distribution: the joint distribution of $x$ and $\theta$ is given by $p(x,\theta) = p(x \mid \theta)\, p(\theta)$, and the compound results as its marginal distribution:

\[ p(x) = \int p(x,\theta)\, d\theta \]

If the domain of $\theta$ is discrete, then the distribution is again a special case of a mixture distribution.
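As a rough numerical illustration of the marginalization described above, the following Python sketch integrates the parameter out on a grid. The choice of a normal $F$ with a normally distributed mean, the parameter values, and the helper names `normal_pdf` and `compound_pdf` are assumptions made for this example only; they are not part of the definition.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def compound_pdf(x, mu=1.0, sigma2=4.0, tau2=2.0, n_grid=20001):
    """Approximate p_H(x) = integral of p_F(x | theta) p_G(theta) d theta.

    Assumed example: F = Normal(theta, tau2), G = Normal(mu, sigma2).
    """
    half_width = 12.0 * np.sqrt(sigma2)
    theta = np.linspace(mu - half_width, mu + half_width, n_grid)
    integrand = normal_pdf(x, theta, tau2) * normal_pdf(theta, mu, sigma2)
    # Trapezoidal rule, written out explicitly
    dtheta = theta[1] - theta[0]
    return float(np.sum((integrand[:-1] + integrand[1:]) * 0.5 * dtheta))

x0 = 0.3
numeric = compound_pdf(x0)
# For this particular pair the compound is Normal(mu, tau2 + sigma2)
closed_form = float(normal_pdf(x0, 1.0, 2.0 + 4.0))
print(numeric, closed_form)
```

For this particular pair the integral has a closed form, which is why it is convenient as a check; for other pairs of distributions only the numerical route may be available.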

Properties

The compound distribution $H$ will depend on the specific expression of each distribution, as well as on which parameter of $F$ is distributed according to the distribution $G$. The parameters of $H$ will include any parameters of $G$ that are not marginalized, or integrated, out. The support of $H$ is the same as that of $F$, and if the latter is a two-parameter distribution parameterized with the mean and variance, some general properties exist. The compound distribution's first two moments are given by:

\[ \operatorname{E}_H[X] = \operatorname{E}_G\bigl[\operatorname{E}_F[X \mid \theta]\bigr] \]

\[ \operatorname{Var}_H(X) = \operatorname{E}_G\bigl[\operatorname{Var}_F(X \mid \theta)\bigr] + \operatorname{Var}_G\bigl(\operatorname{E}_F[X \mid \theta]\bigr) \]

The second identity is the law of total variance. If the mean of $F$ is distributed as $G$, which in turn has mean $\mu$ and variance $\sigma^2$, the expressions above imply $\operatorname{E}_H[X] = \operatorname{E}_G[\theta] = \mu$ and $\operatorname{Var}_H(X) = \operatorname{Var}_F(X \mid \theta) + \operatorname{Var}_G(\theta) = \tau^2 + \sigma^2$, where $\tau^2$ is the variance of $F$.
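As a worked illustration of the two moment formulas, consider a case chosen here purely for convenience (an assumption of this sketch, not part of the general statement): $X \mid p \sim \operatorname{Binomial}(n, p)$ with $p \sim \operatorname{Beta}(\alpha, \beta)$. Writing $m = \alpha/(\alpha+\beta)$ and $v = \alpha\beta/\bigl((\alpha+\beta)^2(\alpha+\beta+1)\bigr)$ for the mean and variance of $p$, the formulas give:

```latex
\begin{align}
\operatorname{E}_H[X] &= \operatorname{E}_G[n p] = n m \\
\operatorname{Var}_H(X) &= \operatorname{E}_G\bigl[n p (1 - p)\bigr] + \operatorname{Var}_G(n p) \\
&= n\,(m - m^2 - v) + n^2 v \\
&= n m (1 - m) + n (n - 1)\, v
\end{align}
```

The first term is the variance of a binomial distribution with fixed success probability $m$. The second term is nonnegative and grows with the spread $v$ of the mixing distribution, so for $n > 1$ the compound has a strictly larger variance than the binomial with the same mean.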

Proof

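Let $F$ and $G$ be probability distributions parameterized with mean and variance as

\begin{align}
x &\sim \mathcal{F}(\theta,\tau^2) \\
\theta &\sim \mathcal{G}(\mu,\sigma^2)
\end{align}

Then, denoting the probability density functions as $f(x \mid \theta) = p_F(x \mid \theta)$ and $g(\theta) = p_G(\theta)$ respectively, and with $h(x)$ being the probability density of $H$, we have

\begin{align}
\operatorname{E}_H[X] = \int_F x\, h(x)\, dx &= \int_F x \int_G f(x \mid \theta)\, g(\theta)\, d\theta\, dx \\
&= \int_G \int_F x\, f(x \mid \theta)\, dx\; g(\theta)\, d\theta \\
&= \int_G \operatorname{E}_F[X \mid \theta]\, g(\theta)\, d\theta
\end{align}

and we have from the parameterization $\mathcal{F}$ and $\mathcal{G}$ that

\begin{align}
\operatorname{E}_F[X \mid \theta] &= \int_F x\, f(x \mid \theta)\, dx = \theta \\
\operatorname{E}_G[\theta] &= \int_G \theta\, g(\theta)\, d\theta = \mu
\end{align}

and therefore the mean of the compound distribution is $\operatorname{E}_H[X] = \mu$, as per the expression for its first moment above.

The variance of $H$ is given by $\operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2$, and

\begin{align}
\operatorname{E}_H[X^2] = \int_F x^2 h(x)\, dx &= \int_F x^2 \int_G f(x \mid \theta)\, g(\theta)\, d\theta\, dx \\
&= \int_G g(\theta) \int_F x^2 f(x \mid \theta)\, dx\, d\theta \\
&= \int_G g(\theta)\, (\tau^2 + \theta^2)\, d\theta \\
&= \tau^2 \int_G g(\theta)\, d\theta + \int_G g(\theta)\, \theta^2\, d\theta \\
&= \tau^2 + (\sigma^2 + \mu^2),
\end{align}

given the fact that $\int_F x^2 f(x \mid \theta)\, dx = \operatorname{E}_F[X^2 \mid \theta] = \operatorname{Var}_F(X \mid \theta) + (\operatorname{E}_F[X \mid \theta])^2$ and $\int_G \theta^2 g(\theta)\, d\theta = \operatorname{E}_G[\theta^2] = \operatorname{Var}_G(\theta) + (\operatorname{E}_G[\theta])^2$. Finally we get

\begin{align}
\operatorname{Var}_H(X) &= \operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2 \\
&= \tau^2 + \sigma^2 + \mu^2 - \mu^2 \\
&= \tau^2 + \sigma^2
\end{align}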
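The result of the proof can be checked empirically. The sketch below draws from the two-stage model and compares the sample mean and variance with $\mu$ and $\tau^2 + \sigma^2$. Taking both distributions to be normal, and the specific parameter values, are assumptions made only so that the example is concrete; the proof itself relies only on the means and variances.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

mu, sigma2 = 1.5, 4.0   # assumed mean and variance of G
tau2 = 2.25             # assumed variance of F given theta
n = 1_000_000

theta = rng.normal(mu, np.sqrt(sigma2), size=n)   # theta ~ G
x = rng.normal(theta, np.sqrt(tau2))              # x | theta ~ F

sample_mean = x.mean()
sample_var = x.var()
print(sample_mean, mu)                 # should be close to each other
print(sample_var, tau2 + sigma2)       # should be close to each other
```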

Applications

Testing

Distributions of common test statistics result as compound distributions under their null hypothesis, for example in Student's t-test (where the test statistic results as the ratio of a normal random variable and the square root of a scaled chi-squared random variable), or in the F-test (where the test statistic is the ratio of two scaled chi-squared random variables).
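The t-statistic case can be sketched in Python as follows. The code builds the statistic once as a ratio and once as a compound (a normal variable whose variance is itself random), and compares both sample variances with the known variance $\nu/(\nu-2)$ of Student's t-distribution. The degrees of freedom, sample size, and variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 10           # assumed degrees of freedom
n = 1_000_000

z = rng.standard_normal(n)          # numerator: standard normal
v = rng.chisquare(nu, size=n)       # denominator ingredient: chi-squared
t_ratio = z / np.sqrt(v / nu)       # ratio construction of the t statistic

# The same distribution written as a compound: draw a random variance
# first, then a normal variable with that variance
variance = nu / rng.chisquare(nu, size=n)
t_compound = rng.normal(0.0, np.sqrt(variance))

expected_var = nu / (nu - 2)        # variance of Student's t for nu > 2
print(t_ratio.var(), t_compound.var(), expected_var)
```

The random variance `nu / chisquare(nu)` follows a scaled inverse chi-squared distribution, which is a special case of the inverse gamma distribution.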

Overdispersion modeling

Compound distributions are useful for modeling outcomes exhibiting overdispersion, i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the Poisson distribution, whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its rate parameter, implemented via a gamma distribution, which results in a marginal negative binomial distribution. This distribution is similar in its shape to the Poisson distribution, but it allows for larger variances. Similarly, a binomial distribution may be generalized to allow for additional variability by compounding it with a beta distribution for its success probability parameter, which results in a beta-binomial distribution.
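A small simulation can make the overdispersion visible. The sketch below draws Poisson counts whose rate is gamma distributed and compares them with direct negative binomial draws. The gamma parameters are assumed values, and the mapping to NumPy's negative binomial parameterization (number of successes equal to the gamma shape, success probability $1/(1+\text{scale})$) is stated in the comments.

```python
import numpy as np

rng = np.random.default_rng(2)
shape, scale = 3.0, 2.0     # assumed gamma parameters for the Poisson rate
n = 500_000

rate = rng.gamma(shape, scale, size=n)    # lambda ~ Gamma(shape, scale)
counts = rng.poisson(rate)                # X | lambda ~ Poisson(lambda)

# Negative binomial with r = shape and success probability p = 1 / (1 + scale)
nb = rng.negative_binomial(shape, 1.0 / (1.0 + scale), size=n)

mean_theory = shape * scale                      # mean of the compound
var_theory = shape * scale * (1.0 + scale)       # variance, larger than the mean
print(counts.mean(), counts.var())
print(nb.mean(), nb.var())
print(mean_theory, var_theory)
```

A plain Poisson model with the same mean would have variance equal to `mean_theory`; the extra factor `1 + scale` is the overdispersion introduced by the random rate.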

Bayesian inference

Besides the ubiquitous marginal distributions that may be seen as special cases of compound distributions, compound distributions arise in Bayesian inference when, in the notation above, $F$ represents the distribution of future observations and $G$ is the posterior distribution of the parameters of $F$, given the information in a set of observed data. This gives a posterior predictive distribution. Correspondingly, for the prior predictive distribution, $F$ is the distribution of a new data point while $G$ is the prior distribution of the parameters.
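The posterior predictive construction can be sketched for a conjugate case. The example below assumes a beta prior on a binomial success probability, hypothetical observed data, and a hypothetical number of future trials; none of these values come from the text above. It samples the parameter from the posterior ($G$), then a future count given that parameter ($F$).

```python
import numpy as np

rng = np.random.default_rng(3)

a, b = 2.0, 2.0          # assumed Beta prior hyperparameters
n_obs, k_obs = 20, 14    # assumed data: 14 successes in 20 trials

# Conjugate update: the posterior of the success probability is again Beta
a_post, b_post = a + k_obs, b + (n_obs - k_obs)

m = 10                   # assumed number of future trials to predict
draws = 400_000
p = rng.beta(a_post, b_post, size=draws)   # p ~ posterior (plays the role of G)
y_new = rng.binomial(m, p)                 # y | p ~ Binomial(m, p) (role of F)

# Mean of the posterior predictive (a beta-binomial distribution)
predictive_mean = m * a_post / (a_post + b_post)
print(y_new.mean(), predictive_mean)
```

Replacing `a_post` and `b_post` with the prior values `a` and `b` in the sampling step would give draws from the prior predictive distribution instead.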

Convolution

Convolution of probability distributions (to derive the probability distribution of sums of random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random location parameter for the other summand.
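The equivalence can be checked on a small discrete case. The sketch below uses the sum of two fair dice as an assumed example: it computes the distribution of the sum once by direct convolution and once by treating the second die as a random location shift applied to the first.

```python
import numpy as np

die = np.full(6, 1.0 / 6.0)   # pmf of one fair die on the faces 1..6

# Direct convolution: pmf of the sum, supported on 2..12 (index i is sum i + 2)
pmf_conv = np.convolve(die, die)

# Compound view: the second die theta shifts the location of the first die,
# and the shifted pmfs are averaged with weights p(theta)
pmf_compound = np.zeros(11)
for theta_index, p_theta in enumerate(die):
    shifted = np.zeros(11)
    shifted[theta_index:theta_index + 6] = die   # pmf of the sum given theta
    pmf_compound += p_theta * shifted

print(np.allclose(pmf_conv, pmf_compound))
print(pmf_conv[5])   # probability that the sum equals 7
```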

Computation

Compound distributions derived from exponential family distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary. Compound distributions may be investigated relatively easily using Monte Carlo methods, i.e., by generating random samples. It is often easy to generate random numbers from the distributions $p(\theta)$ as well as $p(x \mid \theta)$ and then utilize these to perform collapsed Gibbs sampling to generate samples from $p(x)$. A compound distribution may usually also be approximated to a sufficient degree by a mixture distribution using a finite number of mixture components, which allows approximate densities, distribution functions, etc. to be derived.
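Both ideas in this paragraph, two-step sampling and approximation by a finite mixture, can be sketched together. The example assumes an exponential distribution whose rate is gamma distributed (a pair with a known closed-form tail probability, used here only as a reference value); the parameter values, the evaluation point `x0`, and the number of components `K` are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, lam = 3.0, 2.0    # assumed gamma shape and rate for the exponential rate
n = 500_000

# Step 1: draw the latent parameter from p(theta)
rate = rng.gamma(alpha, 1.0 / lam, size=n)
# Step 2: draw the observation from p(x | theta)
x = rng.exponential(1.0 / rate)

x0 = 1.0
tail_mc = np.mean(x > x0)                       # Monte Carlo estimate of P(X > x0)
tail_exact = (1.0 + x0 / lam) ** (-alpha)       # closed-form tail for this pair

# Finite mixture approximation: K equally weighted exponential components,
# each with one sampled rate
K = 1000
components = rate[:K]
tail_mixture = np.mean(np.exp(-components * x0))

print(tail_mc, tail_exact, tail_mixture)
```

Using sampled rates as mixture components is only one possible choice; components placed at quantiles of the mixing distribution would typically need fewer terms for the same accuracy.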

Parameter estimation (maximum-likelihood or maximum-a-posteriori estimation) within a compound distribution model may sometimes be simplified by utilizing the EM algorithm.

Examples

* Gaussian scale mixtures:
** Compounding a normal distribution with variance distributed according to an inverse gamma distribution (or equivalently, with precision distributed as a gamma distribution) yields a non-standardized Student's t-distribution. This distribution has the same symmetrical shape as a normal distribution with the same central point, but has greater variance and heavy tails.
** Compounding a Gaussian (or normal) distribution with variance distributed according to an exponential distribution (or with standard deviation according to a Rayleigh distribution) yields a Laplace distribution. More generally, compounding a Gaussian (or normal) distribution with variance distributed according to a gamma distribution yields a variance-gamma distribution.
** Compounding a Gaussian distribution with variance distributed according to an exponential distribution whose rate parameter is itself distributed according to a gamma distribution yields a normal-exponential-gamma distribution. (This involves two compounding stages. The variance itself then follows a Lomax distribution; see below.)
** Compounding a Gaussian distribution with standard deviation distributed according to a (standard) inverse uniform distribution yields a slash distribution.
* Other Gaussian mixtures:
** Compounding a Gaussian distribution with mean distributed according to another Gaussian distribution yields (again) a Gaussian distribution.
** Compounding a Gaussian distribution with mean distributed according to a shifted exponential distribution yields an exponentially modified Gaussian distribution.
* Compounding a Bernoulli distribution with probability of success $p$ distributed according to a distribution $X$ that has a defined expected value yields a Bernoulli distribution with success probability $\operatorname{E}[X]$. An interesting consequence is that the dispersion of $X$ does not influence the dispersion of the resulting compound distribution.
* Compounding a binomial distribution with probability of success distributed according to a beta distribution yields a beta-binomial distribution. It possesses three parameters: a parameter $n$ (number of samples) from the binomial distribution and shape parameters $\alpha$ and $\beta$ from the beta distribution.
* Compounding a multinomial distribution with probability vector distributed according to a Dirichlet distribution yields a Dirichlet-multinomial distribution.
* Compounding a Poisson distribution with rate parameter distributed according to a gamma distribution yields a negative binomial distribution.
* Compounding a Poisson distribution with rate parameter distributed according to an exponential distribution yields a geometric distribution.
* Compounding an exponential distribution with its rate parameter distributed according to a gamma distribution yields a Lomax distribution.
* Compounding a gamma distribution with inverse scale parameter distributed according to another gamma distribution yields a three-parameter beta prime distribution.
* Compounding a half-normal distribution with its scale parameter distributed according to a Rayleigh distribution yields an exponential distribution. This follows immediately from the Laplace distribution resulting as a normal scale mixture; see above. The roles of conditional and mixing distributions may also be exchanged here; consequently, compounding a Rayleigh distribution with its scale parameter distributed according to a half-normal distribution also yields an exponential distribution.
* A Gamma($k=2$, $\theta$)-distributed random variable whose scale parameter $\theta$ is in turn uniformly distributed on an interval $(0, b)$ marginally follows an exponential distribution with mean $b$.
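Any of the entries above can be spot-checked by simulation. The sketch below does so for one of them, the normal distribution with exponentially distributed variance, by comparing sample quantiles of the compound with those of direct Laplace draws. The mean of the exponential distribution, the sample size, and the chosen quantile levels are assumptions of this example; the Laplace scale $b = \sqrt{m/2}$ follows from matching the variance $m$ of the compound to the Laplace variance $2b^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
mean_var = 2.0           # assumed mean m of the exponential mixing distribution
n = 500_000

variance = rng.exponential(mean_var, size=n)     # V ~ Exponential with mean m
x = rng.normal(0.0, np.sqrt(variance))           # X | V ~ Normal(0, V)

b = np.sqrt(mean_var / 2.0)                      # implied Laplace scale
laplace = rng.laplace(0.0, b, size=n)            # direct Laplace draws

probs = [0.05, 0.25, 0.5, 0.75, 0.95]
q_compound = np.quantile(x, probs)
q_laplace = np.quantile(laplace, probs)
print(q_compound)
print(q_laplace)
```

Agreement of a handful of quantiles is of course not a proof, only a quick sanity check that the stated pair of distributions is plausible.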

Similar terms

The notion of "compound distribution" as used e.g. in the definition of a compound Poisson distribution or compound Poisson process is different from the definition found in this article. The meaning in this article corresponds to what is used in e.g. Bayesian hierarchical modeling. The special case for compound probability distributions where the parametrized distribution $F$ is the Poisson distribution is also called a mixed Poisson distribution.
